Welcome to the SIB Days 2020 - virtual conference Spatial Transcriptomics workshop by 10x genomics!
The purpose of this tutorial will be to walk users through some of the steps necessary to explore data produced by the 10x Genomics Visium Spatail Gene Expression Solution and the Spaceranger pipeline. We will investigate the datasets whith are all freely available from 10x Genomics.
Things to know about this workshop
/mnt/libs/shared_data/library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)
library(RColorBrewer)
Real Dataset for the tutorial
breast_cancer_1 <- Load10X_Spatial(data.dir = "/mnt/libs/shared_data/human_breast_cancer_1/outs/",
filename = "V1_Breast_Cancer_Block_A_Section_1_filtered_feature_bc_matrix.h5")
breast_cancer_2 <- Load10X_Spatial(data.dir = "/mnt/libs/shared_data/human_breast_cancer_2/outs/",
filename = "V1_Breast_Cancer_Block_A_Section_2_filtered_feature_bc_matrix.h5")
Same data just internal to 10x
breast_cancer_1 <- Load10X_Spatial(data.dir = "/mnt/analysis/marsoc/pipestances/HWHTFDSXX/SPATIAL_RNA_COUNTER_PD/163086/HEAD/outs/", slice = "slice1")
breast_cancer_2 <- Load10X_Spatial(data.dir = "/mnt/analysis/marsoc/pipestances/HWHTFDSXX/SPATIAL_RNA_COUNTER_PD/163087/HEAD/outs/", slice = "slice2")
Let’s merge the data together
breast_cancer <- merge(breast_cancer_1, breast_cancer_2)
There are a bunch of datasets hoted by the Satija lab in the Seurat Data Package.
Let’s have a look at some basic QC information. Keep in mind that most seurat plots are ggplot object and can be manipulated as such.
Counts = UMI Features = Genes
Spaceranger does normiliaztion for clustering and DE but does not return that normalized matrix
Pre-normalization Raw UMI counts
SpatialFeaturePlot(breast_cancer, features = c("ERBB2", "CD8A"))
SE transform
Don’t worry about reachediteration limit warnings. See https://github.com/ChristophH/sctransform/issues/25 for discussion
Default assay will now be set to SCT
breast_cancer <- SCTransform(breast_cancer, assay = "Spatial", verbose = TRUE)
SpatialFeaturePlot(breast_cancer, features = c("ERBB2", "CD8A"))
From Seurat:
The default parameters in Seurat emphasize the visualization of molecular data. However, you can also adjust the size of the spots (and their transparency) to improve the visualization of the histology image, by changing the following parameters:
pt.size.factor- This will scale the size of the spots. Default is 1.6
alpha - minimum and maximum transparency. Default is c(1, 1).
Try setting to alpha c(0.1, 1), to downweight the transparency of points with lower expression
p1 <- SpatialFeaturePlot(breast_cancer, features = "TTR", pt.size.factor = 1)+
theme(legend.position = "right") +
ggtitle("Actual Spot Size")
p2 <- SpatialFeaturePlot(breast_cancer, features = "TTR")+
theme(legend.position = "right") +
ggtitle("Scaled Spot Size")
p1 + p2 + plot_annotation(
title = 'Actual Spot Size (left), Scaled Spot Size (right)'
)
Dimensionality reduction, clustering, and visualization
We can then proceed to run dimensionality reduction and clustering on the RNA expression data, using the same workflow as we use for scRNA-seq analysis.
Some of these processes can be parallized
library(future)
# check the current active plan
plan()
# change the current plan to access parallelization
plan("multiprocess", workers = 4)
plan()
multiprocess:
- args: function (expr, envir = parent.frame(), substitute = TRUE, lazy = FALSE, seed = NULL, globals = TRUE, workers = 4, gc = FALSE, earlySignal = FALSE, label = NULL, ...)
- tweaked: TRUE
- call: plan("multiprocess", workers = 4)
The defalut UMAP calculation is performed with the R-based UWOT library However, you can run UMAP in python via retuculate library and umap-learn. We have found that for smaller datasets (<= 10k cells/spots) UWOT is great. For much larger datasets (100k + cells/spots) umap-learn can be a faster option.
breast_cancer <- RunPCA(breast_cancer, assay = "SCT", verbose = FALSE)
breast_cancer <- FindNeighbors(breast_cancer, reduction = "pca", dims = 1:30)
Computing nearest neighbor graph
Computing SNN
breast_cancer <- FindClusters(breast_cancer, verbose = FALSE)
breast_cancer <- RunUMAP(breast_cancer, reduction = "pca", dims = 1:30)
15:07:52 UMAP embedding parameters a = 0.9922 b = 1.112
15:07:52 Read 7834 rows and found 30 numeric columns
15:07:52 Using Annoy for neighbor search, n_neighbors = 30
15:07:52 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:07:53 Writing NN index file to temp file /tmp/Rtmpm6Tyoz/file59f2857ba4f55
15:07:53 Searching Annoy index using 4 threads, search_k = 3000
15:07:54 Annoy recall = 100%
15:07:55 Commencing smooth kNN distance calibration using 4 threads
15:07:56 Initializing from normalized Laplacian + noise
15:08:03 Commencing optimization for 500 epochs, with 330408 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
15:08:23 Optimization finished
Now let’s have a look at the clustering
First let’s extract the index of the barcodes and add it to the metadata
substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))
}
breast_cancer@meta.data$merged.ident <- paste("Slice_", substrRight(rownames(breast_cancer@meta.data), 1), sep = "")
Let’s have a look. It looks like there isn’t any large difference between the two slices that might make us want to do some sort of normilization for batch effects. There is a really nice paper on integration from a single cell perspective that was published on bioRxiv recently. MD Luecken et. al
DimPlot(breast_cancer, reduction = "umap", label = FALSE, group.by = c("ident", "merged.ident")) +
labs(color = "Cluster")
I don’t really like these colors so let’s change them
p1 <- DimPlot(breast_cancer, reduction = "umap", label = TRUE) +
labs(color = "Cluster")
p2 <- SpatialDimPlot(breast_cancer, label = TRUE, label.size = 3) +
labs(fill = "Cluster")
p1 + p2 + plot_annotation(
title = 'Clustering in UMAP and Tissue Space',
caption = 'Processed by Spaceranger 1.1\nNormilization and Clustering by Seurat'
) + plot_layout(nrow = 2)
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
p1 <- DimPlot(breast_cancer, reduction = "umap", label = TRUE) +
labs(color = "Cluster") +
scale_color_manual(values = c("#b2df8a","#e41a1c","#377eb8","#4daf4a","#ff7f00","gold",
"#a65628", "#999999", "black", "pink", "purple", "brown",
"grey", "yellow", "green"))
p2 <- SpatialDimPlot(breast_cancer, label = TRUE, label.size = 3) +
labs(fill = "Cluster")
p1 + p2 + plot_annotation(
title = 'Clustering in UMAP and Tissue Space',
caption = 'Processed by Spaceranger 1.1\nNormilization and Clustering by Seurat'
) + plot_layout(nrow = 2)
Interactivity not working for me on firefox
LinkedDimPlot(breast_cancer)
First we’ll idetify differentially expressed genes.
Parallelization helps here too let’s make sure our plan is still intact
plan()
- call: plan("multiprocess", workers = 4) indicates that it is
Looks like we have some very DE genes for clusters 4 and 11
clarify what ident.1 = 4, ident.2 = 6 are for
de_markers <- FindMarkers(breast_cancer, ident.1 = 4, ident.2 = 6)
SpatialFeaturePlot(object = breast_cancer, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)
what are the top variable features?
VariableFeatures(breast_cancer)[1:10]
[1] "IGKC" "IGLC2" "IGHG3" "MGP" "ALB" "IGHG1" "IGLC3" "IGHM" "IGHA1" "CRISP3"
what are the top de genes?
rownames(de_markers)[1:10]
[1] "LINC00645" "HLA-B" "HLA-C" "B2M" "HLA-A" "DEGS1" "COX6C" "IGFBP2" "PTMA" "MT-CO3"
So what about spatial enrichment?
Some methods 1. Trendsceek 2. Splotch 3. SPARK 4. SpatialDE + We have found this implimenton not to be very effective. It’s also not under active development
Using the top 100 variable genes find spatially enriched ones. Note that in the Seurat Spatial Tutorial they use 1000 genes. You can also use all genes but that will take a long time. Using a calucation of Morans I can sometimes be a faster approach, especially if you are using parallization.
This falls apart for merged data.
breast_cancer <- FindSpatiallyVariableFeatures(breast_cancer,
assay = "Spatial",
slot = "data",
features = VariableFeatures(breast_cancer)[1:100],
selection.method = "markvariogram", verbose = TRUE)
Error in FindSpatiallyVariableFeatures.default(object = data, spatial.location = spatial.location, :
Please provide the same number of observations as spatial locations.
Have a look at the spatially variable genes calculated by markvariogram ordered from most variable to least variable
SpatiallyVariableFeatures(breast_cancer, selection.method = "markvariogram", decreasing = TRUE)
top.features_trendseq <- head(SpatiallyVariableFeatures(breast_cancer, selection.method = "markvariogram"), 6)
SpatialFeaturePlot(breast_cancer, features = top.features_trendseq, ncol = 3, alpha = c(0.1, 1))
Moran’s I implamentation. For other spatial data types the x.cuts and y.cuts determins the grid that is laied over the tissue in the capture area. Here we’ll remove those
breast_cancer <- FindSpatiallyVariableFeatures(breast_cancer, assay = "SCT", slot = "scale.data", features = VariableFeatures(breast_cancer)[1:100],
selection.method = "moransi")
Error in FindSpatiallyVariableFeatures.default(object = data, spatial.location = spatial.location, :
Please provide the same number of observations as spatial locations.
Have a look at the spatially variable genes calculated by moransi ordered from most variable to least variable
SpatiallyVariableFeatures(breast_cancer, selection.method = "moransi", decreasing = TRUE)
top.features_moransi <- head(SpatiallyVariableFeatures(breast_cancer, selection.method = "moransi"), 8)
SpatialFeaturePlot(breast_cancer, features = top.features_moransi, ncol = 4, alpha = c(0.1, 1))
We can see that the results are slightly different. So let’s take a look at why.
spatially_variable_genes <- breast_cancer@assays$SCT@meta.features %>%
tidyr::drop_na()
spatially_variable_genes
You can see the two methods show
mm_cor <- cor.test(spatially_variable_genes$moransi.spatially.variable.rank, spatially_variable_genes$markvariogram.spatially.variable.rank)
ggplot(spatially_variable_genes, aes(x=moransi.spatially.variable.rank,y=markvariogram.spatially.variable.rank))+
geom_point()+
geom_smooth()+
xlab("Morans I Rank")+
ylab("Markvariogram Rank")+
annotate("text", x = 25, y = 75, label = paste("Pearson's Correlation\n", round(mm_cor$estimate[1], digits = 2), sep = ""))+
theme_bw()
10x Genomics, patrick.roelli@10xgenomics.com ↩︎
10x Genomics, stephen.williams@10xgenomics.com↩︎
10x Genomics, stephen.williams@10xgenomics.com↩︎